Search | BVS Regional Portal

DIVE: a reference-free statistical approach to diversity-generating and mobile genetic element discovery.

Abante, Jordi; Wang, Peter L; Salzman, Julia.

Genome Biol; 24(1): 240, 2023 Oct 20.

Article in English | MEDLINE | ID: mdl-37864197

ABSTRACT

Diversity-generating and mobile genetic elements are key to microbial and viral evolution and can result in evolutionary leaps. State-of-the-art algorithms to detect these elements have limitations. Here, we introduce DIVE, a new reference-free approach to overcome these limitations using information contained in sequencing reads alone. We show that DIVE has improved detection power compared to existing reference-based methods using simulations and real data. We use DIVE to rediscover and characterize the activity of known and novel elements and generate new biological hypotheses about the mobilome. Building on DIVE, we develop a reference-free framework capable of de novo discovery of mobile genetic elements.

Subjects

Gene Transfer, Horizontal; Interspersed Repetitive Sequences; DNA Transposable Elements

DNA methylation entropy is associated with DNA sequence features and developmental epigenetic divergence.

Fang, Yuqi; Ji, Zhicheng; Zhou, Weiqiang; Abante, Jordi; Koldobskiy, Michael A; Ji, Hongkai; Feinberg, Andrew P.

Nucleic Acids Res; 51(5): 2046-2065, 2023 Mar 21.

Article in English | MEDLINE | ID: mdl-36762477

ABSTRACT

Epigenetic information defines tissue identity and is largely inherited in development through DNA methylation. While studied mostly for mean differences, methylation also encodes stochastic change, defined as entropy in information theory. Analyzing allele-specific methylation in 49 human tissue sample datasets, we find that methylation entropy is associated with specific DNA binding motifs, regulatory DNA, and CpG density. Then applying information theory to 42 mouse embryo methylation datasets, we find that the contribution of methylation entropy to time- and tissue-specific patterns of development is comparable to the contribution of methylation mean, and methylation entropy is associated with sequence and chromatin features conserved with human. Moreover, methylation entropy is directly related to gene expression variability in development, suggesting a role for epigenetic entropy in developmental plasticity.

Subjects

DNA Methylation; Epigenesis, Genetic; Humans; Animals; Mice; DNA Methylation/genetics; Entropy; CpG Islands/genetics; DNA/genetics

Estimating DNA methylation potential energy landscapes from nanopore sequencing data.

Abante, Jordi; Kambhampati, Sandeep; Feinberg, Andrew P; Goutsias, John.

Sci Rep; 11(1): 21619, 2021 Nov 03.

Article in English | MEDLINE | ID: mdl-34732768

ABSTRACT

High-throughput third-generation nanopore sequencing devices have enormous potential for simultaneously observing epigenetic modifications in human cells over large regions of the genome. However, signals generated by these devices are subject to considerable noise that can lead to unsatisfactory detection performance and hamper downstream analysis. Here we develop a statistical method, CpelNano, for the quantification and analysis of 5mC methylation landscapes using nanopore data. CpelNano takes into account nanopore noise by means of a hidden Markov model (HMM) in which the true but unknown ("hidden") methylation state is modeled through an Ising probability distribution that is consistent with methylation means and pairwise correlations, whereas nanopore current signals constitute the observed state. It then estimates the associated methylation potential energy function by employing the expectation-maximization (EM) algorithm and performs differential methylation analysis via permutation-based hypothesis testing. Using simulations and analysis of published data obtained from three human cell lines (GM12878, MCF-10A, and MDA-MB-231), we show that CpelNano can faithfully estimate DNA methylation potential energy landscapes, substantially improving current methods and leading to a powerful tool for the modeling and analysis of epigenetic landscapes using nanopore sequencing data.
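
The permutation-based hypothesis testing mentioned in the abstract can be illustrated with a generic two-sample test. This is a minimal sketch, not CpelNano's implementation: the function name `permutation_p_value`, the choice of the absolute difference in means as the test statistic, and the example values are all assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Hypothetical helper: the statistic and defaults are illustrative
    assumptions, not taken from CpelNano.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        stat = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        # Small tolerance guards against floating-point summation order.
        if stat >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up per-sample summary values for two conditions.
    condition_1 = [0.10, 0.12, 0.11, 0.09, 0.10]
    condition_2 = [0.90, 0.88, 0.91, 0.92, 0.90]
    print(permutation_p_value(condition_1, condition_2))
```

The test makes no distributional assumption about the statistic; it only requires that group labels be exchangeable under the null hypothesis.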

Subjects

Algorithms; Breast Neoplasms/genetics; DNA Methylation; Epigenesis, Genetic; Lymphocytes/metabolism; Nanopore Sequencing/methods; Sequence Analysis, DNA/methods; Breast Neoplasms/pathology; Cells, Cultured; Female; Genome, Human; Humans

Converging genetic and epigenetic drivers of paediatric acute lymphoblastic leukaemia identified by an information-theoretic analysis.

Koldobskiy, Michael A; Jenkinson, Garrett; Abante, Jordi; Rodriguez DiBlasi, Varenka A; Zhou, Weiqiang; Pujadas, Elisabet; Idrizi, Adrian; Tryggvadottir, Rakel; Callahan, Colin; Bonifant, Challice L; Rabin, Karen R; Brown, Patrick A; Ji, Hongkai; Goutsias, John; Feinberg, Andrew P.

Nat Biomed Eng; 5(4): 360-376, 2021 Apr.

Article in English | MEDLINE | ID: mdl-33859388

ABSTRACT

In cancer, linking epigenetic alterations to drivers of transformation has been difficult, in part because DNA methylation analyses must capture epigenetic variability, which is central to tumour heterogeneity and tumour plasticity. Here, by conducting a comprehensive analysis, based on information theory, of differences in methylation stochasticity in samples from patients with paediatric acute lymphoblastic leukaemia (ALL), we show that ALL epigenomes are stochastic and marked by increased methylation entropy at specific regulatory regions and genes. By integrating DNA methylation and single-cell gene-expression data, we arrived at a relationship between methylation entropy and gene-expression variability, and found that epigenetic changes in ALL converge on a shared set of genes that overlap with genetic drivers involved in chromosomal translocations across the disease spectrum. Our findings suggest that an epigenetically driven gene-regulation network, with UHRF1 (ubiquitin-like with PHD and RING finger domains 1) as a central node, links genetic drivers and epigenetic mediators in ALL.

Subjects

Epigenesis, Genetic; Models, Theoretical; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics; CCAAT-Enhancer-Binding Proteins/genetics; Child; Core Binding Factor Alpha 2 Subunit/genetics; Cytogenetic Analysis; DNA Methylation; Entropy; Gene Editing; Gene Expression Regulation, Neoplastic; Humans; Oncogene Proteins, Fusion/genetics; Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology; RNA-Seq; Single-Cell Analysis; Stochastic Processes; Ubiquitin-Protein Ligases/genetics

A Dysregulated DNA Methylation Landscape Linked to Gene Expression in MLL-Rearranged AML.

Koldobskiy, Michael A; Abante, Jordi; Jenkinson, Garrett; Pujadas, Elisabet; Tetens, Ashley; Zhao, Feifei; Tryggvadottir, Rakel; Idrizi, Adrian; Reinisch, Andreas; Majeti, Ravindra; Goutsias, John; Feinberg, Andrew P.

Epigenetics; 15(8): 841-858, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32114880

ABSTRACT

Translocations of the KMT2A (MLL) gene define a biologically distinct and clinically aggressive subtype of acute myeloid leukaemia (AML), marked by a characteristic gene expression profile and few cooperating mutations. Although dysregulation of the epigenetic landscape in this leukaemia is particularly interesting given the low mutation frequency, its comprehensive analysis using whole genome bisulphite sequencing (WGBS) has not been previously performed. Here we investigated epigenetic dysregulation in nine MLL-rearranged (MLL-r) AML samples by comparing them to six normal myeloid controls, using a computational method that encapsulates mean DNA methylation measurements along with analyses of methylation stochasticity. We discovered a dramatically altered epigenetic profile in MLL-r AML, associated with genome-wide hypomethylation and a markedly increased DNA methylation entropy reflecting an increasingly disordered epigenome. Methylation discordance mapped to key genes and regulatory elements that included bivalent promoters and active enhancers. Genes associated with significant changes in methylation stochasticity recapitulated known MLL-r AML expression signatures, suggesting a role for the altered epigenetic landscape in the transcriptional programme initiated by MLL translocations. Accordingly, we established statistically significant associations between discordances in methylation stochasticity and gene expression in MLL-r AML, thus providing a link between the altered epigenetic landscape and the phenotype.

Subjects

DNA Methylation; Gene Expression Regulation, Neoplastic; Leukemia, Biphenotypic, Acute/genetics; Leukemia, Myeloid, Acute/genetics; Epigenesis, Genetic; Histone-Lysine N-Methyltransferase/genetics; Humans; Leukemia, Biphenotypic, Acute/metabolism; Leukemia, Myeloid, Acute/metabolism; Myeloid-Lymphoid Leukemia Protein/genetics; Transcriptome; Translocation, Genetic

Ranking genomic features using an information-theoretic measure of epigenetic discordance.

Jenkinson, Garrett; Abante, Jordi; Koldobskiy, Michael A; Feinberg, Andrew P; Goutsias, John.

BMC Bioinformatics; 20(1): 175, 2019 Apr 08.

Article in English | MEDLINE | ID: mdl-30961526

ABSTRACT

BACKGROUND: Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression whose disruption has been implicated in human diseases like cancer. It is therefore crucial to know which genes, or other genomic features of interest, exhibit significant discordance in DNA methylation between two phenotypes. We have previously proposed an approach for ranking genes based on methylation discordance within their promoter regions, determined by centering a window of fixed size at their transcription start sites. However, we cannot use this method to identify statistically significant genomic features and handle features of variable length and with missing data. RESULTS: We present a new approach for computing the statistical significance of methylation discordance within genomic features of interest in single and multiple test/reference studies. We base the proposed method on a well-articulated hypothesis testing problem that produces p- and q-values for each genomic feature, which we then use to identify and rank features based on the statistical significance of their epigenetic dysregulation. We employ the information-theoretic concept of mutual information to derive a novel test statistic, which we can evaluate by computing Jensen-Shannon distances between the probability distributions of methylation in a test and a reference sample. We design the proposed methodology to simultaneously handle biological, statistical, and technical variability in the data, as well as variable feature lengths and missing data, thus enabling its wide-spread use on any list of genomic features. This is accomplished by estimating, from reference data, the null distribution of the test statistic as a function of feature length using generalized additive regression models. 
Differential assessment, using normal/cancer data from healthy fetal tissue and pediatric high-grade glioma patients, illustrates the potential of our approach to greatly facilitate the exploratory phases of clinically and biologically relevant methylation studies. CONCLUSIONS: The proposed approach provides the first computational tool for statistically testing and ranking genomic features of interest based on observed DNA methylation discordance in comparative studies that accounts, in a rigorous manner, for biological, statistical, and technical variability in methylation data, as well as for variability in feature length and for missing data.
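
The Jensen-Shannon distance that the abstract uses to compare methylation distributions in a test and a reference sample can be computed as follows. This is a minimal sketch for two discrete distributions over the same outcomes. It assumes base-2 logarithms, so that the distance lies in [0, 1], and the function names are hypothetical. It does not reproduce the paper's test statistic or its null-distribution modeling.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q."""
    # The mixture m is positive wherever p or q is positive, so the
    # logarithms inside kl_divergence are always well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(jsd, 0.0))


if __name__ == "__main__":
    # Made-up distributions over four methylation patterns.
    reference = [0.70, 0.10, 0.10, 0.10]
    test = [0.25, 0.25, 0.25, 0.25]
    print(jensen_shannon_distance(test, reference))
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric and remains finite when the two distributions have different supports.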

Subjects

Epigenesis, Genetic; Epigenomics; Genomics; DNA Methylation; Genome, Human; Humans; Neoplasms/diagnosis; Neoplasms/genetics; Probability

An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John.

BMC Bioinformatics; 19(1): 87, 2018 Mar 07.

Article in English | MEDLINE | ID: mdl-29514626

ABSTRACT

BACKGROUND: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. RESULTS: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. 
CONCLUSIONS: This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
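
To make the idea of a joint 1D Ising distribution over neighboring methylation sites concrete, the sketch below enumerates every methylation pattern of a short region and computes its Shannon entropy. It is an illustration under stated assumptions, not the authors' software: the parameter names `a` (per-site terms) and `c` (nearest-neighbor coupling terms), the sign convention of the energy, and brute-force enumeration (feasible only for a handful of sites) are all choices made here.

```python
import itertools
import math


def ising_distribution(a, c):
    """Joint probability of each methylation pattern under a 1D Ising model.

    a: per-site parameters (length N); c: coupling parameters (length N - 1).
    Patterns are tuples of 0 (unmethylated) and 1 (methylated).
    """
    n = len(a)
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        spins = [2 * xi - 1 for xi in x]  # map {0, 1} to {-1, +1}
        energy = -sum(a[i] * spins[i] for i in range(n))
        energy -= sum(c[i] * spins[i] * spins[i + 1] for i in range(n - 1))
        weights[x] = math.exp(-energy)
    z = sum(weights.values())  # partition function
    return {x: w / z for x, w in weights.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    independent = ising_distribution([0.0, 0.0, 0.0], [0.0, 0.0])
    coupled = ising_distribution([0.0, 0.0, 0.0], [2.0, 2.0])
    print(shannon_entropy(independent), shannon_entropy(coupled))
```

With all parameters at zero the patterns are equally likely and the entropy reaches its maximum of N bits. Positive coupling concentrates probability on fully methylated and fully unmethylated patterns and lowers the entropy, which is the kind of dependence a marginal, site-by-site model cannot represent.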

Subjects

Information Theory; Models, Theoretical; Statistics as Topic; Sulfites/chemistry; Whole Genome Sequencing/methods; Base Sequence; Computer Simulation; CpG Islands/genetics; DNA Methylation/genetics; Entropy; Epigenesis, Genetic; Gene Ontology; Genome, Human; Humans; Lung Neoplasms/genetics; Probability; Web Browser

HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment.

Abante, Jordi; Ghaffari, Noushin; Johnson, Charles D; Datta, Aniruddha.

BMC Genomics; 18(1): 694, 2017 Sep 05.

Article in English | MEDLINE | ID: mdl-28874136

ABSTRACT

BACKGROUND: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. METHODS: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. RESULTS: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. CONCLUSIONS: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.
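
The general idea of scoring a contig by how well it matches sequence patterns learned from training data can be sketched with a plain order-k Markov chain. This is a simplified illustration, not HiMMe itself: it omits the emission probabilities that the abstract says account for noise, and the function names, the add-one smoothing, and the four-letter ACGT alphabet are assumptions made here.

```python
import math
from collections import defaultdict


def train_markov_chain(sequences, k=2, pseudocount=1.0):
    """Estimate P(next base | previous k bases) from training sequences.

    Returns a function prob(context, base). Smoothing assumes an ACGT alphabet.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})  # unseen contexts fall back to uniform
        total = sum(ctx.values()) + 4 * pseudocount
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def score_contig(contig, prob, k=2):
    """Average log-likelihood per transition; higher means a better match."""
    n = len(contig) - k
    if n <= 0:
        raise ValueError("contig must be longer than k")
    log_lik = sum(math.log(prob(contig[i:i + k], contig[i + k])) for i in range(n))
    return log_lik / n


if __name__ == "__main__":
    model = train_markov_chain(["ACGT" * 50], k=2)
    print(score_contig("ACGTACGTACGT", model))  # matches the training pattern
    print(score_contig("AAAAAAAAAAAA", model))  # pattern unseen in training
```

Normalizing by the number of transitions gives a per-contig score that can be compared across contigs of different lengths, in the spirit of the per-contig quality metric the abstract describes.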

Subjects

Genomics/methods; Markov Chains; Algorithms; Bayes Theorem; Reproducibility of Results
